Advanced Computer Architecture by Junjie Wu & Lian Li

Author:Junjie Wu & Lian Li
Publisher: Springer Singapore, Singapore


4 Parallel LDA Training on GPU

4.1 Data Partition

The LDA algorithm is insensitive to the order of documents in a dataset and to the order of words in a document, which gives a sound theoretical basis for parallelizing it. The common data partition scheme divides the documents into a number of partitions and distributes the partitions to different nodes or cores; all the nodes or cores then update their data after synchronization and communication. However, as mentioned in Sect. 2, one document may contain several times as many words as another. At each synchronization point, every node or core must therefore wait for the one that holds the documents with the most words. GPU architectures face the same load-imbalance problem. We therefore propose a partition scheme that distributes the data evenly over the threads.

Our data partition scheme is motivated by the following observation: sampling a document dataset amounts to sampling the words in it, since LDA represents a document as a vector of word frequencies. We therefore pool the words of all documents into a single dataset. In each iteration, we count the number of words in the dataset, denoted as N, and distribute them over K threads. We ignore the document subscript of each word, so each thread loads N / K words. In CUDA, a kernel can be executed by multiple equally shaped blocks, and every block has its blockId; one block can have many threads, and every thread has its threadId [10, 11]. The total number K of threads in the kernel is calculated by K = gridDim × blockDim, where gridDim is the number of blocks and blockDim is the number of threads per block. We compute each thread's id as id = blockId × blockDim + threadId, and every thread loads N / K words ranging from id × N / K to (id + 1) × N / K - 1.
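
The index arithmetic behind this even split can be sketched in CUDA as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names chunk_begin and slice_sizes, the toy size N = 10, and the 2 × 4 launch shape are hypothetical, and the sketch uses integer floor division so that a word count N that is not a multiple of K is still covered exactly once. gridDim, blockDim, blockIdx, and threadIdx are CUDA's built-in variables.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: first word index owned by thread `tid` when N words
// are split over K threads. The 64-bit product avoids overflow of tid * N.
__host__ __device__ inline int chunk_begin(int tid, int N, int K) {
    return (int)(((long long)tid * N) / K);
}

// Each thread walks its own contiguous slice [begin, end) of the pooled
// word array and records how many words it was assigned.
__global__ void slice_sizes(int N, int *words_seen) {
    int K   = gridDim.x * blockDim.x;                 // total threads in the launch
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id
    int begin = chunk_begin(tid, N, K);
    int end   = chunk_begin(tid + 1, N, K);
    int count = 0;
    for (int i = begin; i < end; ++i) {
        // A real sampler would draw a new topic for word i here.
        ++count;
    }
    words_seen[tid] = count;
}

int main() {
    const int N = 10;                  // toy number of pooled words (assumed)
    const int blocks = 2, threads = 4; // toy launch shape (assumed)
    const int K = blocks * threads;

    int *words_seen = nullptr;
    cudaMallocManaged(&words_seen, K * sizeof(int));

    slice_sizes<<<blocks, threads>>>(N, words_seen);
    cudaDeviceSynchronize();

    int total = 0;
    for (int t = 0; t < K; ++t) {
        printf("thread %d: %d words\n", t, words_seen[t]);
        total += words_seen[t];
    }
    printf("total words visited: %d of %d\n", total, N);

    cudaFree(words_seen);
    return 0;
}
```

Because the slice boundaries depend only on the thread id, N, and K, no thread receives more than one word above or below N / K, regardless of how the words are spread over documents. Error checking of the CUDA calls is omitted for brevity.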

This data partition scheme has a side effect: the words of one document may be distributed to different threads. When different threads sample the data on the GPU in parallel, write conflicts can occur. Multiple threads may access the same entry of the document-topic matrix at the same time when they happen to process words of the same document simultaneously. We call this document-topic conflict. Likewise, multiple threads may access the same entry of the word-topic matrix or of the topic vector at the same time when they happen to process the same word or the same topic simultaneously; we call these word-topic conflict and topic-vector conflict, respectively. Such conflicts may lead to wrong inference results and operation failure. In this paper, we use atomic operations to solve this problem; more details are described in the next section.
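
To illustrate how atomic operations avoid the three conflicts, the following CUDA sketch updates the document-topic matrix, the word-topic matrix, and the topic vector with atomicAdd. It is a simplified illustration under stated assumptions, not the sampler described in the next section: topic assignments are given as input rather than sampled, and the names accumulate_counts and count_on_gpu, the row-major layouts, the toy corpus, and the 2 × 2 launch shape are all hypothetical.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// doc_of[i], word_of[i], topic_of[i] describe the i-th pooled word.
// doc_topic is D x T, word_topic is V x T, topic_total has T entries
// (row-major int counts; layout is an assumption of this sketch).
__global__ void accumulate_counts(const int *doc_of, const int *word_of,
                                  const int *topic_of, int N, int T,
                                  int *doc_topic, int *word_topic,
                                  int *topic_total) {
    int K   = gridDim.x * blockDim.x;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int begin = (int)(((long long)tid * N) / K);
    int end   = (int)(((long long)(tid + 1) * N) / K);
    for (int i = begin; i < end; ++i) {
        int t = topic_of[i];
        // Threads may hit the same entry; atomicAdd serializes the updates.
        atomicAdd(&doc_topic[doc_of[i] * T + t], 1);   // document-topic conflict
        atomicAdd(&word_topic[word_of[i] * T + t], 1); // word-topic conflict
        atomicAdd(&topic_total[t], 1);                 // topic-vector conflict
    }
}

// Hypothetical host wrapper: stages the arrays in managed memory, launches
// the kernel, and copies the T per-topic totals into topic_total_out.
void count_on_gpu(const int *doc, const int *word, const int *topic,
                  int N, int D, int V, int T, int *topic_total_out) {
    int *doc_of, *word_of, *topic_of, *doc_topic, *word_topic, *topic_total;
    cudaMallocManaged(&doc_of, N * sizeof(int));
    cudaMallocManaged(&word_of, N * sizeof(int));
    cudaMallocManaged(&topic_of, N * sizeof(int));
    cudaMallocManaged(&doc_topic, D * T * sizeof(int));
    cudaMallocManaged(&word_topic, V * T * sizeof(int));
    cudaMallocManaged(&topic_total, T * sizeof(int));

    memcpy(doc_of, doc, N * sizeof(int));
    memcpy(word_of, word, N * sizeof(int));
    memcpy(topic_of, topic, N * sizeof(int));
    memset(doc_topic, 0, D * T * sizeof(int));
    memset(word_topic, 0, V * T * sizeof(int));
    memset(topic_total, 0, T * sizeof(int));

    accumulate_counts<<<2, 2>>>(doc_of, word_of, topic_of, N, T,
                                doc_topic, word_topic, topic_total);
    cudaDeviceSynchronize();
    memcpy(topic_total_out, topic_total, T * sizeof(int));

    cudaFree(doc_of); cudaFree(word_of); cudaFree(topic_of);
    cudaFree(doc_topic); cudaFree(word_topic); cudaFree(topic_total);
}

int main() {
    // Toy corpus (assumed): 6 words, 2 documents, 3 vocabulary words, 2 topics.
    const int N = 6, D = 2, V = 3, T = 2;
    int doc[N]   = {0, 0, 0, 1, 1, 1};
    int word[N]  = {0, 1, 2, 0, 1, 1};
    int topic[N] = {0, 1, 0, 0, 1, 1};
    int totals[T];
    count_on_gpu(doc, word, topic, N, D, V, T, totals);
    for (int t = 0; t < T; ++t)
        printf("topic %d: %d words\n", t, totals[t]);
    return 0;
}
```

With a plain read-modify-write such as topic_total[t] += 1, two threads could read the same old value and one increment would be lost; atomicAdd performs the read, add, and write as one indivisible step, at the cost of serializing threads that target the same entry. Error checking of the CUDA calls is omitted for brevity.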


